Improve schemas for non-total TypedDict#11
Conversation
bentsku
left a comment
There was a problem hiding this comment.
LGTM, this looks good! Nice fix to a tricky solution, a None type hint can mean a lot for non-total typed dicts, so I think this a good way to solve that 👍
| class PyType(TypedDict, total=False): | ||
| name: str | ||
| age: int | None | ||
| opt: Opt | None |
There was a problem hiding this comment.
might be worth to add a str | None here just to verify what happens when we do Union[str, None, str]?
There was a problem hiding this comment.
Done in c514ae0. Union can't have duplicates, so it would just be Union[str, | None]. If you actually try to create a type alias like this, you'd get a Optional[str] :)
| "type": "string", | ||
| }, | ||
| {"name": "age", "type": ["long", "null", "string"]}, | ||
| {"name": "opt", "type": [{"namedString": "Opt", "type": "string"}, "null"]}, |
There was a problem hiding this comment.
question: so here the regular str is not added, so would that work if we add the special marker to a Opt enum type? seems like yes as your comment in the deduplication logic
There was a problem hiding this comment.
Yes, it would. {"namedString": "Opt", "type": "string"} is totally equivalent to "type": "string" for Avro.
5e4135e to
c514ae0
Compare
TypedDictsin Python can be marked withtotal=None. This means that omitting a field still results in a valid structure.It gets trickier when non-total
TypedDictshave also nullable fields.For instance, imagine the following structure:
The resulting schema would be something like that:
{ "type": "record", "name": "PyType", "fields": [ { "name": "height", "type": [ "long", "null" ] }, { "name": "weight", "type": [ "long", "null" ] } ] }Now, imagine the two following objects.
The schema would fit both the Avro representation, but does not let us distinguish the case in which
weightis explicitly set toNonefrom the case in which it is not set at all.With this PR, we extend the types of the fields of a non-total TypedDict with
string. This is going to be useful for all clients to mark a field as explicitly missing. Eveyrhing would make more sense if you look at the mentioned PR using this feature.Addresses PNX-545